Abstract

There exist two different versions of the Kullback-Leibler divergence (K-Ld) in
Tsallis statistics, namely the usual generalized K-Ld and the generalized Bregman K-Ld.
Problems have been encountered in trying to reconcile them. A condition for
consistency between these two generalized K-Ld-forms by recourse to the additive duality
of Tsallis statistics is derived. It is also shown that the usual generalized K-Ld
subjected to this additive duality, known as the dual generalized K-Ld, is a scaled
Bregman divergence. This leads to an interesting conclusion: the dual generalized
mutual information is a scaled Bregman information. The utility and implications
of these results are discussed.
1 Introduction



The generalized statistics of Tsallis has recently been the focus of much
attention in statistical physics, complex systems, and allied disciplines (in this
paper the terms generalized statistics, nonadditive statistics, and nonextensive
statistics are used interchangeably) [1]. It is well known that nonadditive
statistics generalizes the extensive Boltzmann-Gibbs-Shannon (B-G-S) statistics.
Its scope has lately been extended to studies of lossy data compression in
communication theory [2] and machine learning [3, 4]. In this paper, attention
is focused upon the Tsallis generalization of the concept of relative entropy,
also known as the Kullback-Leibler divergence (K-Ld), which constitutes a
fundamental distance measure in information theory [5]. The generalized K-Ld [6]
encountered in deformed statistics has been described by Naudts [7] both as a
special form of f-divergences [8, 9] and in terms of Bregman divergences
[10]. Bregman divergences are, in turn, information geometric tools that have
lately acquired great significance in a variety of disciplines ranging from
information retrieval [11] and lossy data compression and machine learning [12] to
statistical physics [13].



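The generalized K-Ld is defined as [7]

$$D^{\kappa}(p\|r) = -\sum_i p_i\,\omega_\kappa\!\left(\frac{r_i}{p_i}\right) = \frac{1}{\kappa}\sum_i p_i\left[\left(\frac{p_i}{r_i}\right)^{\kappa} - 1\right], \qquad (1)$$

where $p$ is an arbitrary distribution, $r$ is the reference distribution, and $\kappa$ is
some nonadditivity parameter satisfying $-1 < \kappa < 1$; $\kappa \neq 0$. Here (1) employs
the definition of the so-called deduced logarithm [7]

$$\omega_\kappa(x) = \frac{1}{\kappa}\left(1 - x^{-\kappa}\right). \qquad (2)$$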



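An alternate form of the generalized K-Ld derived from the theory of Bregman
divergences [7] is shown to be

$$D^{B}_{\kappa}(p\|r) = S_\kappa(r) - S_\kappa(p) - \sum_i (p_i - r_i)\ln_\kappa(r_i), \qquad (3)$$

where the generalized entropy and the deformed logarithm are defined as

$$S_\kappa(p) = \sum_i p_i\,\omega_\kappa\!\left(\frac{1}{p_i}\right), \qquad (4)$$

and

$$\ln_\kappa(x) = \left(1 + \kappa^{-1}\right)\left(x^{\kappa} - 1\right), \qquad (5)$$

respectively.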



1.1 Problems reconciling the Tsallis versions of the Kullback-Leibler divergence



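Specializing the above concepts to the Tsallis scenario by setting $\kappa = q - 1$,
Eqs. (1) and (3) yield the usual doubly convex generalized K-Ld [6]

$$D^{K-L}_{q}(p\|r) = \frac{1}{q-1}\sum_i p_i\left[\left(\frac{p_i}{r_i}\right)^{q-1} - 1\right], \qquad (6)$$

and the generalized Bregman K-Ld

$$D^{B}_{q}(p\|r) = \frac{1}{q-1}\sum_i p_i\left(p_i^{q-1} - r_i^{q-1}\right) - \sum_i (p_i - r_i)\,r_i^{q-1}, \qquad (7)$$

respectively.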



While the form of the generalized Bregman K-Ld (BK-Ld) is more appealing 
than (6) from an information geometric viewpoint, it does contain certain 
inherent drawbacks. 
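The specialization $\kappa = q - 1$ can be checked numerically. The following is a minimal Python sketch, not taken from the original work: the function names and the two three-state distributions are hypothetical, and the generalized entropy is assumed to have the form $S_\kappa(p) = \sum_i p_i\,\omega_\kappa(1/p_i)$. Under these assumptions the $\kappa$-forms (1) and (3) coincide with the Tsallis forms (6) and (7).

```python
def omega(x, kappa):
    """Deduced logarithm of Eq. (2)."""
    return (1.0 - x ** (-kappa)) / kappa


def ln_kappa(x, kappa):
    """Deformed logarithm of Eq. (5)."""
    return (1.0 + 1.0 / kappa) * (x ** kappa - 1.0)


def entropy_kappa(p, kappa):
    """Generalized entropy, assumed here to be sum_i p_i * omega(1/p_i)."""
    return sum(pi * omega(1.0 / pi, kappa) for pi in p)


def kld_kappa(p, r, kappa):
    """Generalized K-Ld of Eq. (1)."""
    return -sum(pi * omega(ri / pi, kappa) for pi, ri in zip(p, r))


def bregman_kld_kappa(p, r, kappa):
    """Bregman form of Eq. (3); p and r are assumed to be normalized."""
    cross = sum((pi - ri) * ln_kappa(ri, kappa) for pi, ri in zip(p, r))
    return entropy_kappa(r, kappa) - entropy_kappa(p, kappa) - cross


def kld_tsallis(p, r, q):
    """Usual generalized K-Ld in the Tsallis scenario, Eq. (6)."""
    return sum(pi * ((pi / ri) ** (q - 1.0) - 1.0) for pi, ri in zip(p, r)) / (q - 1.0)


def bregman_kld_tsallis(p, r, q):
    """Generalized Bregman K-Ld in the Tsallis scenario, Eq. (7)."""
    first = sum(pi * (pi ** (q - 1.0) - ri ** (q - 1.0)) for pi, ri in zip(p, r)) / (q - 1.0)
    second = sum((pi - ri) * ri ** (q - 1.0) for pi, ri in zip(p, r))
    return first - second


# Hypothetical distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
r = [0.2, 0.3, 0.5]
q = 1.5
print(kld_kappa(p, r, q - 1.0), kld_tsallis(p, r, q))
print(bregman_kld_kappa(p, r, q - 1.0), bregman_kld_tsallis(p, r, q))
```

The two divergences generally take different values for the same pair of distributions, which is the reconciliation problem discussed in this subsection.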



A study by Abe and Bagci [13] has demonstrated that the generalized K-Ld
defined by (6) is jointly convex in terms of both $p_i$ and $r_i$, while the form
defined by (7) is convex only in terms of $p_i$. A further distinction between
the two forms of the generalized K-Ld concerns the property of composability.
While the form defined by (6) is composable, the form defined by (7) does not
exhibit this property. The fact that the two generalized K-Ld versions have no
apparent relation to each other should be a cause of concern for practitioners
of nonextensive statistical physics.

A second issue concerns the manner in which mean values are
computed. Nonextensive statistics has employed a number of forms in which
expectations may be defined. Prominent among these are the linear
constraints originally employed by Tsallis [1] (also known as normal averages)
of the form $\langle A\rangle = \sum_i p_i A_i$, the Curado-Tsallis (C-T) constraints [14] of the
form $\langle A\rangle_q = \sum_i p_i^q A_i$, and the normalized Tsallis-Mendes-Plastino (TMP)
constraints [15] (also known as q-averages) of the form
$\langle\langle A\rangle\rangle_q = \sum_i \frac{p_i^q}{\sum_j p_j^q} A_i$.
A fourth constraining procedure is the optimal Lagrange multiplier (OLM)
approach [16]. Of these four methods to describe expectations, the one most
commonly employed by Tsallis practitioners is the TMP one.
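The three explicit averaging prescriptions can be compared side by side. The following Python sketch is illustrative only: the function names and the three-state example are hypothetical. It shows that the three forms coincide at q = 1, and that only the normal and q-averages return a constant observable unchanged.

```python
def normal_average(p, A):
    """Linear (normal) average: sum_i p_i * A_i."""
    return sum(pi * ai for pi, ai in zip(p, A))


def curado_tsallis_average(p, A, q):
    """Curado-Tsallis average: sum_i p_i**q * A_i (weights are not normalized)."""
    return sum(pi ** q * ai for pi, ai in zip(p, A))


def q_average(p, A, q):
    """Normalized TMP (q-)average: sum_i p_i**q * A_i / sum_j p_j**q."""
    norm = sum(pi ** q for pi in p)
    return curado_tsallis_average(p, A, q) / norm


# Hypothetical three-state example (values chosen only for illustration).
p = [0.5, 0.3, 0.2]
A = [1.0, 2.0, 4.0]
for q in (0.5, 1.0, 1.5):
    print(q, normal_average(p, A), curado_tsallis_average(p, A, q), q_average(p, A, q))
```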

Recent works by Abe [17, 18] suggest that in generalized statistics
expectations defined in terms of normal averages, in contrast to those defined by
q-averages, are consistent with the generalized H-theorem and the generalized
Stosszahlansatz (molecular chaos hypothesis). The correctness of normal
average expectations vis-a-vis q-average (or TMP) ones has also been investigated
by Hasegawa [19, 20]. Understandably, a re-formulation of the variational
perturbation approximations in nonextensive statistical physics followed [21], via
an application of q-deformed calculus [22].

A further concern arises from a consistency issue. This stems from the
fact that the form of the generalized K-Ld defined by (6) is consistent with
expectations and constraints defined by q-averages while, on the other hand,
the generalized Bregman K-Ld defined by (7) is consistent with expectations
defined by normal averages [13].

1.2 Additive duality 

The additive duality is a fundamental property in generalized statistics. One
implication of the additive duality is that it permits a deformed logarithm
defined by a given nonadditivity parameter (say, $q$) to be inferred from its
dual deformed logarithm [1, 2, 23] parameterized by $q^* = 2 - q$.

Our leitmotif for invoking the additive duality stems from the form of the
BK-Ld (7). Setting $\kappa = q - 1$ in (2) and (5) yields a Tsallis entropy of the
form $S_q(z) = -z\ln_q(z)$, which is the Tsallis entropy defined in Section 2.1
of this paper subjected to the re-parameterization $q \to 2 - q$ ¹. Thus, in
the Tsallis scenario, (5) is actually the dual Tsallis entropy defined in (15)
with the additive duality ($q \to 2 - q$) implicitly accounted for. Given these
facts, from the definition of Bregman divergences provided by Definition 1
in Section 2.3 below, the form of the BK-Ld (7) can only be obtained by
specifying the convex generating function as $\phi(z) = z\ln_q z$, followed by the
re-parameterization $q \to 2 - q$. More specifically, the BK-Ld (7) can only
be derived from first principles using (5) defined in the Tsallis scenario by
recourse to the additive duality. Hence the necessity for invoking the additive
duality in this paper, where the re-parameterization is explicitly accounted for
by defining $q^* = 2 - q$.

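By definition (see Section 2.1 below for details), the generalized K-Ld
subjected to the additive duality is referred to as the dual generalized K-Ld, having
the form

$$D^{K-L}_{q^*}[p\|r] = \sum_i p_i\ln_{q^*}\!\left(\frac{p_i}{r_i}\right) = \frac{1}{1-q^*}\sum_i p_i\left[\left(\frac{p_i}{r_i}\right)^{1-q^*} - 1\right]. \qquad (8)$$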
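As a numerical illustration of the re-parameterization, the Python sketch below compares the usual generalized K-Ld written as $-\sum_i p_i\ln_q(r_i/p_i)$ with the dual form $\sum_i p_i\ln_{q^*}(p_i/r_i)$ at $q^* = 2 - q$. The function names and the test distributions are hypothetical, and the q-logarithm is taken in its standard form $\ln_q(x) = (x^{1-q}-1)/(1-q)$.

```python
def ln_q(x, q):
    """q-deformed logarithm, valid for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def usual_kld(p, r, q):
    """Usual generalized K-Ld, written as -sum_i p_i ln_q(r_i / p_i)."""
    return -sum(pi * ln_q(ri / pi, q) for pi, ri in zip(p, r))


def dual_kld(p, r, q_star):
    """Dual generalized K-Ld, sum_i p_i ln_{q*}(p_i / r_i)."""
    return sum(pi * ln_q(pi / ri, q_star) for pi, ri in zip(p, r))


# Hypothetical distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
r = [0.2, 0.3, 0.5]
q = 1.3
print(usual_kld(p, r, q), dual_kld(p, r, 2.0 - q))
```

Both expressions return the same number, since the duality only relabels the nonadditivity parameter.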

However, employing the definitions of Bregman divergences presented in
Section 2.3 below, the BK-Ld is of the form

$$D^{B}_{q^*}[p\|r] = \frac{1}{1-q^*}\sum_i p_i\left(p_i^{1-q^*} - r_i^{1-q^*}\right) - \sum_i (p_i - r_i)\,r_i^{1-q^*}, \qquad (9)$$

for the convex generating function $\phi(z) = z\ln_{q^*} z$.



1 Here "$\to$" denotes a re-parameterization of the nonadditivity parameter, and is
not a limit.
1.3 Goal of this paper



Scaled Bregman divergences, formally introduced by Stummer [24] and
Stummer and Vajda [25], unify separable Bregman divergences [10] (defined below
in Section 2.3) and f-divergences [8, 9]. This paper uses scaled Bregman
divergences as its basis, and accomplishes the following objectives:

• (i) the generalized K-Ld defined by (6) subjected to the additive duality
(dual generalized K-Ld (8) and (15)) is shown to be consistent with the
canonical probability that maximizes the dual Tsallis entropy of the form [2,
26] $S_{q^*} = -\sum_i p_i\ln_{q^*} p_i$, employed in conjunction with expectations defined
by normal averages (Section 3 of this paper),

• (ii) a correspondence between the dual generalized K-Ld and the generalized
Bregman K-Ld is derived (Section 4 below),

• (iii) the dual generalized K-Ld is demonstrated to be a scaled Bregman
divergence, and its expectation is shown to be a scaled Bregman information, i.e.
the expectation of a scaled Bregman divergence, for both
regimes of the dual nonadditivity parameter, $0 < q^* < 1$ and $q^* > 1$ [27]
(Section 5 below).

Section 6 is devoted to discussion and conclusions. The primary conclusion
of this paper is the necessity of employing the dual generalized K-Ld when
performing a minimum cross entropy analysis (principle of minimum
discrimination information) of Kullback [28] and Kullback and Khairat [29] using
constraints defined by normal average expectations.



2 Theoretical preliminaries 



The essential concepts around which this communication revolves are reviewed 
in the three subsections that follow. 



2.1 Tsallis entropy and the additive duality 



The Tsallis entropy is defined in terms of discrete variables as
[1]

$$S_q(X) = -\frac{1 - \sum_x p^q(x)}{1-q}; \qquad \sum_x p(x) = 1. \qquad (10)$$

The constant $q$ is referred to as the nonadditivity parameter. Here, (10) implies
that extensive B-G-S statistics is recovered as $q \to 1$. Taking the limit $q \to 1$
in (10) and invoking l'Hôpital's rule, $S_q(X) \to S(X)$, i.e., the Shannon
entropy. Nonextensive statistics is intimately related to q-deformed algebra
and calculus (see [22] and the references within). The q-deformed logarithm
and exponential are defined as [22]



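$$\ln_q(x) = \frac{x^{1-q} - 1}{1-q},$$

and,

$$\exp_q(x) = \begin{cases} \left[1 + (1-q)x\right]^{\frac{1}{1-q}}; & 1 + (1-q)x \geq 0, \\ 0; & \text{otherwise}, \end{cases} \qquad (11)$$

respectively. In this respect, an important relation from q-deformed algebra is
[2, 22, 27]

$$\ln_q\!\left(\frac{x}{y}\right) = y^{q-1}\left(\ln_q x - \ln_q y\right). \qquad (12)$$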
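The q-deformed logarithm and exponential, together with the quotient relation (12), lend themselves to a quick numerical check. The Python sketch below is illustrative: the function names are hypothetical, and the exponential is cut off for a strictly positive base so that a zero base is never raised to a negative power when q > 1.

```python
import math


def ln_q(x, q):
    """q-deformed logarithm; reduces to the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(x, q):
    """q-deformed exponential with the usual cut-off (returns 0 outside the support)."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    # Strict inequality is an implementation choice made here to avoid 0**negative.
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def quotient_rule_gap(x, y, q):
    """Difference between the two sides of the quotient relation (12)."""
    lhs = ln_q(x / y, q)
    rhs = y ** (q - 1.0) * (ln_q(x, q) - ln_q(y, q))
    return lhs - rhs


print(exp_q(ln_q(2.0, 0.7), 0.7))      # inverse pair on the support
print(quotient_rule_gap(3.0, 2.0, 1.4))
```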

The Tsallis entropy (10), conditional Tsallis entropy, and joint Tsallis entropy
may be written as [1]

$$\begin{aligned}
S_q(X) &= -\sum_x p(x)^q \ln_q p(x), \\
S_q(\tilde{X}|X) &= -\sum_x\sum_{\tilde{x}} p(x,\tilde{x})^q \ln_q p(\tilde{x}|x), \\
S_q(X,\tilde{X}) &= -\sum_x\sum_{\tilde{x}} p(x,\tilde{x})^q \ln_q p(x,\tilde{x}) \\
&= S_q(X) + S_q(\tilde{X}|X) = S_q(\tilde{X}) + S_q(X|\tilde{X}),
\end{aligned} \qquad (13)$$

respectively.
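The relations in (10) and (13) can be illustrated numerically. The following Python sketch uses hypothetical function names and a hypothetical 2x2 joint distribution over two discrete variables. It checks that the two expressions for the Tsallis entropy agree, that the joint entropy decomposes into a marginal plus a conditional entropy, and that the Shannon entropy is approached as q tends to 1.

```python
import math


def ln_q(x, q):
    """q-deformed logarithm, valid for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_entropy(p, q):
    """Tsallis entropy in the form of Eq. (10)."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)


def tsallis_entropy_lnq(p, q):
    """Tsallis entropy in the q-logarithm form: -sum_x p(x)**q ln_q p(x)."""
    return -sum(pi ** q * ln_q(pi, q) for pi in p)


def conditional_tsallis_entropy(joint, q):
    """Conditional entropy of the column variable given the row variable."""
    total = 0.0
    for row in joint:
        p_row = sum(row)  # marginal probability of the conditioning value
        for p_xy in row:
            total -= p_xy ** q * ln_q(p_xy / p_row, q)
    return total


# Hypothetical joint distribution; rows index the first variable.
joint = [[0.1, 0.2], [0.3, 0.4]]
flat = [v for row in joint for v in row]
marginal = [sum(row) for row in joint]
q = 1.7
print(tsallis_entropy(flat, q),
      tsallis_entropy(marginal, q) + conditional_tsallis_entropy(joint, q))
```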



This paper makes prominent use of the additive duality in nonextensive
statistics. Setting $q^* = 2 - q$, from (11) the dual deformed logarithm and exponential
are defined as

$$\ln_{q^*}(x) = -\ln_q\!\left(\frac{1}{x}\right), \quad \text{and,} \quad \exp_{q^*}(x) = \frac{1}{\exp_q(-x)}. \qquad (14)$$
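The duality relations in (14) follow directly from the standard definitions of the q-logarithm and q-exponential and can be confirmed numerically. The Python sketch below uses hypothetical function names and sample arguments chosen so that both exponentials lie on their support.

```python
def ln_q(x, q):
    """q-deformed logarithm, valid for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(x, q):
    """q-deformed exponential on its support (returns 0 outside it)."""
    base = 1.0 + (1.0 - q) * x
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def dual(q):
    """Additive duality q* = 2 - q."""
    return 2.0 - q


q = 1.3
x = 0.5
print(ln_q(x, dual(q)), -ln_q(1.0 / x, q))        # dual logarithm
print(exp_q(x, dual(q)), 1.0 / exp_q(-x, q))      # dual exponential
```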



The dual Tsallis entropy, the dual conditional Tsallis entropy, the dual joint
Tsallis entropy, and the dual generalized K-Ld may thus be written as

$$\begin{aligned}
S_{q^*}(X) &= -\sum_x p(x)\ln_{q^*} p(x), \\
S_{q^*}(\tilde{X}|X) &= -\sum_x\sum_{\tilde{x}} p(x,\tilde{x})\ln_{q^*} p(\tilde{x}|x), \\
S_{q^*}(X,\tilde{X}) &= -\sum_x\sum_{\tilde{x}} p(x,\tilde{x})\ln_{q^*} p(x,\tilde{x}) \\
&= S_{q^*}(X) + S_{q^*}(\tilde{X}|X) = S_{q^*}(\tilde{X}) + S_{q^*}(X|\tilde{X}), \\
\text{and,} \quad D^{K-L}_{q^*}[p(X)\|r(X)] &= \sum_x p(x)\ln_{q^*}\frac{p(x)}{r(x)},
\end{aligned} \qquad (15)$$

respectively. The dual Tsallis entropy has already been studied in a
maximum (Tsallis) entropy setting (for example, see Ref. [26]). Note that the dual
Tsallis entropy acquires a form identical to the B-G-S entropies, with $\ln_{q^*}(\cdot)$
replacing $\log(\cdot)$. It is important to note that the $q^* = 2 - q$ duality has been
studied within the Sharma-Taneja-Mittal framework by Kaniadakis et al.
[30]. The dual Tsallis entropy has been demonstrated to support a
parametrically extended information theory, as is defined in Theorem 2 below.



Theorem 1 [2]: Let $X_1, X_2, X_3, \ldots, X_n$ be random variables obeying the
probability distribution $p(x_1, x_2, x_3, \ldots, x_n)$, then we have the chain rule

$$S_{q^*}(X_1, X_2, X_3, \ldots, X_n) = \sum_{i=1}^{n} S_{q^*}(X_i | X_{i-1}, \ldots, X_1). \qquad (16)$$



2.2 Generalized mutual informations 



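Given random variables $X \in \mathcal{X}$ and $\tilde{X} \in \tilde{\mathcal{X}}$ with instances $x$ and $\tilde{x}$ respectively, for
$0 < q < 1$, the generalized mutual information is defined in terms of the
generalized K-Ld [2]

$$I_q(X;\tilde{X}) = -\sum_{x,\tilde{x}} p(x,\tilde{x})\ln_q\!\left(\frac{p(x)\,p(\tilde{x})}{p(x,\tilde{x})}\right). \qquad (17)$$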



For nonadditivity parameters in the range $q > 1$, the generalized mutual
information is [2, 27]

$$\begin{aligned}
I_q(X;\tilde{X}) &= S_q(X) - S_q(X|\tilde{X}) = S_q(\tilde{X}) - S_q(\tilde{X}|X) \\
&= S_q(X) + S_q(\tilde{X}) - S_q(X,\tilde{X}) = I_q(\tilde{X};X); \quad q > 1.
\end{aligned} \qquad (18)$$

For (18) to hold true, the inequalities (sub-additivities)

$$S_q(X|\tilde{X}) \leq S_q(X), \quad \text{and,} \quad S_q(\tilde{X}|X) \leq S_q(\tilde{X}), \qquad (19)$$

have to hold true. This is not guaranteed for nonadditivity parameters in the
range $0 < q < 1$ [2, 27].

As stated in Refs. [2] and [27], the generalized mutual information is
defined separately within the two q-ranges $0 < q < 1$ and $q > 1$, which have
different uses. For $0 < q < 1$, the generalized mutual information, as defined by
(17), provides a means of extrapolating the Csiszár-Tusnády theory [31] to the
nonextensive domain for two convex sets of probability distributions [2]. This
has important implications in communication theory and allied disciplines [2,
5].

For $q > 1$, the generalized mutual information as defined by (18) possesses
a number of important properties such as the generalized data processing
inequality and the generalized Fano inequality [27]. This allows one to define
Lagrangians and cost functions for processes defined by a Markov chain
relation.

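Theorem 2 [2]: The generalized mutual informations for nonadditivity
parameters in the ranges $0 < q < 1$ and $q > 1$ are related via the additive duality

$$\begin{aligned}
I_{q^*}(X;\tilde{X}) &= -\sum_x\sum_{\tilde{x}} p(x,\tilde{x})\ln_{q^*}\!\left(\frac{p(x)\,p(\tilde{x})}{p(x,\tilde{x})}\right) \\
&\xrightarrow{\;q^* \to q\;} S_q(X) + S_q(\tilde{X}) - S_q(X,\tilde{X}) = I_q(X;\tilde{X}); \quad 0 < q^* < 1, \ \text{and,} \ q > 1.
\end{aligned} \qquad (20)$$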

2.3 Bregman divergences and scaled Bregman divergences 

This sub-section introduces the formal definition of Bregman divergences and
some of their select properties. The Bregman divergence or Bregman distance
is similar to a metric, but does not in general satisfy the triangle inequality nor
symmetry. Bregman divergences do however obey a generalized Pythagorean theorem
(for example, see Appendix A in [12]). There are two ways in which Bregman
divergences are important. Firstly, they generalize squared Euclidean distances
to a class of distances that all share similar properties. Secondly, they bear
a strong connection to exponential families of distributions: there is a
bijection between regular exponential families and regular Bregman divergences.
Bregman divergences are named after L. M. Bregman [10], who introduced the
concept in 1967. More recently, researchers in geometric algorithms have shown
that many important algorithms can be generalized from Euclidean metrics
to distances defined by Bregman divergences.

Definition 1 (Bregman divergences) [10, 32]: Let $\phi$ be a real valued strictly
convex function defined on the convex set $S \subseteq \mathrm{dom}(\phi)$, the domain of $\phi$, such
that $\phi$ is differentiable on $\mathrm{ri}(S)$, the relative interior of $S$. The Bregman
divergence $B_\phi : S \times \mathrm{ri}(S) \mapsto [0, \infty)$ is defined as
$B_\phi(z_1, z_2) = \phi(z_1) - \phi(z_2) - \langle z_1 - z_2, \nabla\phi(z_2)\rangle$,
where $\nabla\phi(z_2)$ is the gradient of $\phi$ evaluated at $z_2$ ².



2 Note that $\langle\cdot,\cdot\rangle$ denotes the inner product. Calligraphic fonts denote sets.
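Definition 1 can be applied directly to the separable generating function $\phi(z) = z\ln_{q^*} z$ discussed in Section 1.2. The Python sketch below uses hypothetical function names and distributions. It evaluates the Bregman divergence component by component from the definition and compares it with the closed form obtained by expanding the definition, with prefactor $1/(1-q^*)$; the derivative $\phi'(z) = \ln_{q^*} z + z^{1-q^*}$ is computed by hand.

```python
def ln_q(x, q):
    """q-deformed logarithm, valid for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def phi(z, q_star):
    """Generating function phi(z) = z * ln_{q*}(z)."""
    return z * ln_q(z, q_star)


def dphi(z, q_star):
    """Derivative of phi: ln_{q*}(z) + z**(1 - q*)."""
    return ln_q(z, q_star) + z ** (1.0 - q_star)


def bregman_from_definition(p, r, q_star):
    """Separable Bregman divergence: sum_i phi(p_i) - phi(r_i) - (p_i - r_i) phi'(r_i)."""
    return sum(phi(pi, q_star) - phi(ri, q_star) - (pi - ri) * dphi(ri, q_star)
               for pi, ri in zip(p, r))


def bregman_closed_form(p, r, q_star):
    """Closed form obtained by expanding the definition for this generator."""
    e = 1.0 - q_star
    first = sum(pi * (pi ** e - ri ** e) for pi, ri in zip(p, r)) / e
    second = sum((pi - ri) * ri ** e for pi, ri in zip(p, r))
    return first - second


# Hypothetical distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
r = [0.2, 0.3, 0.5]
print(bregman_from_definition(p, r, 0.7), bregman_closed_form(p, r, 0.7))
```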
Definition 2 (Notations) [25]: $\mathcal{M}$ denotes the space of all finite measures on a
measurable space $(\mathcal{X}, \mathcal{A})$ and $\mathcal{P} \subset \mathcal{M}$ the subspace of all probability measures.
Unless otherwise explicitly stated, $P, R, M$ are mutually measure-theoretically
equivalent measures on $(\mathcal{X}, \mathcal{A})$ dominated by a $\sigma$-finite measure $\lambda$ on $(\mathcal{X}, \mathcal{A})$.
Then the densities

$$p = \frac{dP}{d\lambda}, \quad r = \frac{dR}{d\lambda}, \quad \text{and,} \quad m = \frac{dM}{d\lambda}, \qquad (21)$$

have a common support which will be identified with $\mathcal{X}$. Unless stated
otherwise, it is assumed that $P, R \in \mathcal{P}$, $M \in \mathcal{M}$ and that $\phi : (0, \infty) \mapsto \mathbb{R}$ is a
continuous and convex function.



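Definition 3 (Scaled Bregman Divergences) [25]: The Bregman divergence
of probability measures $P, R$ scaled by an arbitrary measure $M$ on $(\mathcal{X}, \mathcal{A})$
measure-theoretically equivalent with $P, R$ is defined by

$$\begin{aligned}
B_\phi(P, R\,|\,M) &= \int_{\mathcal{X}} \left[\phi\!\left(\frac{dP}{dM}\right) - \phi\!\left(\frac{dR}{dM}\right) - \left(\frac{dP}{dM} - \frac{dR}{dM}\right)\nabla\phi\!\left(\frac{dR}{dM}\right)\right] dM \\
&= \int_{\mathcal{X}} \left[m\,\phi\!\left(\frac{p}{m}\right) - m\,\phi\!\left(\frac{r}{m}\right) - (p - r)\,\nabla\phi\!\left(\frac{r}{m}\right)\right] d\lambda.
\end{aligned} \qquad (22)$$

The convex $\phi$ may be interpreted as the generating function of the divergence.
In a discrete setting, a scaled Bregman divergence is defined as [25]

$$B_\phi(p, r\,|\,m) = \sum_{i=1}^{n} \left[\phi\!\left(\frac{p_i}{m_i}\right) - \phi\!\left(\frac{r_i}{m_i}\right) - \left(\frac{p_i}{m_i} - \frac{r_i}{m_i}\right)\nabla\phi\!\left(\frac{r_i}{m_i}\right)\right] m_i. \qquad (23)$$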
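The discrete definition translates directly into code. The Python sketch below is a hypothetical helper, not part of the cited works, which evaluates the discrete scaled Bregman divergence for a user-supplied generating function and its derivative. As a sanity check in the undeformed case $\phi(t) = t\log t$, the scaling $m = r$ reproduces the ordinary Kullback-Leibler divergence for normalized $p$ and $r$.

```python
import math


def scaled_bregman(p, r, m, phi, dphi):
    """Discrete scaled Bregman divergence with scaling m (all entries assumed positive)."""
    total = 0.0
    for pi, ri, mi in zip(p, r, m):
        tp, tr = pi / mi, ri / mi
        total += mi * (phi(tp) - phi(tr) - (tp - tr) * dphi(tr))
    return total


def shannon_phi(t):
    """Undeformed generating function t * log(t)."""
    return t * math.log(t)


def shannon_dphi(t):
    """Derivative of t * log(t)."""
    return math.log(t) + 1.0


def kl_divergence(p, r):
    """Ordinary Kullback-Leibler divergence."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))


# Hypothetical distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
r = [0.2, 0.3, 0.5]
print(scaled_bregman(p, r, r, shannon_phi, shannon_dphi), kl_divergence(p, r))
```

Setting every $m_i = 1$ recovers the unscaled separable Bregman divergence of Definition 1.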



3 Maximum dual Tsallis entropy models 



The Tsallis entropy parameterized by $q$ is defined as [1]

$$S_q[p] = -\frac{1 - \sum_i p_i^q}{1-q}. \qquad (24)$$

Setting $q^* = 2 - q$, the dual Tsallis entropy is expressed as [2, 26]

$$S_{q^*}[p] = -\frac{1 - \sum_i p_i^{2-q^*}}{q^*-1}. \qquad (25)$$

The q*-deformed Lagrangian (normal averages used) to be extremized reads

$$\Phi_{q^*}[p, \alpha, \beta] = S_{q^*}[p] - \alpha\left(\sum_i p_i - 1\right) - \beta\left(\sum_i p_i E_i - U\right), \qquad (26)$$

yielding the canonical probability that maximizes the dual Tsallis entropy as



Pi = 



'i-UglriE-iri 



(27) 



Note that the methodology developed in [33] is employed in the maximum
Tsallis entropy analysis using constraints defined by normal averages. Here, $Z(\beta^*)$ is
the canonical partition function. The Appendix of this paper provides the
detailed derivation of (27). The dual Tsallis entropy is defined as

Here, $\beta^* = \beta/(2 - q^*)$ is referred to as the "dual scaled inverse thermodynamic
temperature".



4 Correspondence between the generalized Kullback-Leibler divergences



The generalized free energy (GFE) for normal averages expectations is defined
as [21]

$$F_q = U - \frac{1}{\beta} S_q[p]. \qquad (29)$$

Note that the expression for the GFE (29) has recently been the object of
much research and debate. The effective inverse temperature $\beta$ is the energy
Lagrange multiplier scaled with respect to $q$. The energy Lagrange multiplier
relates to the thermodynamic temperature $T$ as $\beta = \frac{1}{k_B T}$, where $k_B$
is the Boltzmann constant (sometimes set to unity for the sake of convenience),
only in the limiting case $q \to 1$. Prominent attempts to clarify this issue are
those by Abe et al. [34] and Abe [35], amongst others. Similarly, the q*-deformed
(dual) GFE is defined as

$$F_{q^*} = U - \frac{1}{\beta^*} S_{q^*}[p]. \qquad (30)$$



At this stage, setting the reference probability $r = \tilde{p}$ in (9), and
associating the quantities $\tilde{S}_{q^*}$, $\tilde{U}$, and $\tilde{\aleph}_{q^*}$ with the maximum dual Tsallis entropy canonical
distribution $\tilde{p}_i$, yields

$$D^{B}_{q^*}[p\|\tilde{p}] = \frac{1}{1-q^*}\sum_i p_i\left(p_i^{1-q^*} - \tilde{p}_i^{\,1-q^*}\right) - \sum_i (p_i - \tilde{p}_i)\,\tilde{p}_i^{\,1-q^*}. \qquad (31)$$

Substituting (27) into (31) one gets



Df. [p \\P] = (xz-ry EPi (>V " (1 " q*) P*U - K q * + (1 - q*) 
-E(Pi-Pi) (*V + (l-g*)£* (£-£/)). 



(32) 



With the aid of (28) and (30) and the normalization property, (32) leads now 
to 



£>? [Pi lift 



and the dual generalized K-Ld defined in (15) becomes 
D q K- L \pi\\Pi}=EPiW (|) 



1-9 



(1-9*) 



/3* 



Fq* ~~ F q * 



v q * Eft = F 



Fq* — F q * 



=1-9* 



(33) 



(34) 



From (33) and (34), the correspondence relation between the usual generalized 
K-Ld, the dual generalized K-Ld, and the generalized Bregman K-Ld is 



Dft-L \Pi lift] = DC>K - L[nm q ~ q D -->* 



*9 ' (35) 

*q=Epl~ q 



i 



which is a compact result. 



It is important to point out that one application of the correspondence relation 
presented in this Section is that of providing an alternate means to derive 
the dual generalized K-Ld from the generalized Bregman K-Ld. This may be 
accomplished by invoking the linearity property of Bregman divergences (see 
Appendix A of Ref. [12]). 

The above mentioned linearity property states that the Bregman divergence
is a linear operator, i.e., $\forall x \in S$, $y \in \mathrm{ri}(S)$ (where $\mathrm{ri}(\cdot)$ denotes the relative
interior of a set), $B_{c\phi}(x, y) = cB_\phi(x, y)$ (for $c > 0$). From (35), it is immediately
evident that multiplying (31) by: c = ^ q * > and invoking (12) readily yields 
the dual generalized K-Ld. This relation between the dual generalized K-Ld 
and Bregman divergences may however be viewed as one of convenience, which 
although tenable, lacks the formal theoretical rigor of the results presented in 
Section 5 below. 
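The linearity property invoked here is easy to confirm numerically. The Python sketch below uses hypothetical function names, with the quadratic generator $\phi(z) = z^2$ chosen only because its Bregman divergence is the familiar squared Euclidean distance. It checks that scaling the generating function by a constant $c$ scales the divergence by the same constant.

```python
def bregman(x, y, phi, dphi):
    """Separable Bregman divergence: sum_i phi(x_i) - phi(y_i) - (x_i - y_i) phi'(y_i)."""
    return sum(phi(xi) - phi(yi) - (xi - yi) * dphi(yi) for xi, yi in zip(x, y))


def scale_generator(c, phi, dphi):
    """Return the generator c*phi together with its derivative c*phi'."""
    return (lambda z: c * phi(z)), (lambda z: c * dphi(z))


def square(z):
    """Illustrative generator phi(z) = z**2."""
    return z * z


def dsquare(z):
    """Derivative of z**2."""
    return 2.0 * z


# Hypothetical points, chosen only for illustration.
x = [1.0, 2.0, 3.0]
y = [0.5, 2.5, 1.0]
c = 2.5
c_phi, c_dphi = scale_generator(c, square, dsquare)
print(bregman(x, y, c_phi, c_dphi), c * bregman(x, y, square, dsquare))
```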
5 Dual generalized K-Ld, scaled Bregman divergences, and the
scaled Bregman information 



This Section serves a two-fold purpose: (i) it is established that the dual gener- 
alized K-Ld defined in (15) is a scaled Bregman divergence, (ii) we introduce 
the concept of scaled Bregman information as the expectation of a scaled 
Bregman divergence. 



5.1 Dual generalized K-Ld as a scaled Bregman divergence



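Let (i) $t = \frac{p}{m}$ and (ii) the generating function of the Bregman divergence be
a convex function $\phi(t)$, with $m$ the scaling. For a generating function $\phi(t) =
t\ln_{q^*} t$, the discrete form of the scaled Bregman divergence (23) acquires the
form

$$\begin{aligned}
B_\phi(p, r\,|\,m) &= \sum_i \left[\frac{p_i}{m_i}\ln_{q^*}\frac{p_i}{m_i} - \frac{r_i}{m_i}\ln_{q^*}\frac{r_i}{m_i} - \left(\frac{p_i}{m_i} - \frac{r_i}{m_i}\right)\nabla\!\left(\frac{r_i}{m_i}\ln_{q^*}\frac{r_i}{m_i}\right)\right] m_i \\
&= \sum_i \left[p_i\ln_{q^*}\frac{p_i}{m_i} - p_i\ln_{q^*}\frac{r_i}{m_i} - (p_i - r_i)\left(\frac{r_i}{m_i}\right)^{1-q^*}\right] \\
&= \sum_i \left[p_i m_i^{q^*-1}\left(\ln_{q^*} p_i - \ln_{q^*} r_i\right) - (p_i - r_i)\left(\frac{r_i}{m_i}\right)^{1-q^*}\right].
\end{aligned} \qquad (36)$$

At this point, specifying $m_i = r_i$ in (36), and invoking (12) and the
normalization relation $\sum_i p_i = \sum_i r_i = 1$, the dual generalized K-Ld in (15) is recovered,
i.e.

$$B_\phi(p, r\,|\,m = r) = \sum_i p_i\ln_{q^*}\!\left(\frac{p_i}{r_i}\right). \qquad (37)$$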
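The reduction to the dual generalized K-Ld can be verified numerically. The Python sketch below uses hypothetical function names and distributions. It evaluates the discrete scaled Bregman divergence with generating function $\phi(t) = t\ln_{q^*} t$ and scaling $m = r$, and compares it with $\sum_i p_i\ln_{q^*}(p_i/r_i)$; the agreement relies on $p$ and $r$ being normalized.

```python
def ln_q(x, q):
    """q-deformed logarithm, valid for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def scaled_bregman_qstar(p, r, m, q_star):
    """Discrete scaled Bregman divergence for phi(t) = t * ln_{q*}(t)."""
    def phi(t):
        return t * ln_q(t, q_star)

    def dphi(t):
        return ln_q(t, q_star) + t ** (1.0 - q_star)

    total = 0.0
    for pi, ri, mi in zip(p, r, m):
        tp, tr = pi / mi, ri / mi
        total += mi * (phi(tp) - phi(tr) - (tp - tr) * dphi(tr))
    return total


def dual_kld(p, r, q_star):
    """Dual generalized K-Ld: sum_i p_i ln_{q*}(p_i / r_i)."""
    return sum(pi * ln_q(pi / ri, q_star) for pi, ri in zip(p, r))


# Hypothetical normalized distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
r = [0.2, 0.3, 0.5]
for q_star in (0.7, 1.4):
    print(q_star, scaled_bregman_qstar(p, r, r, q_star), dual_kld(p, r, q_star))
```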



This is a q*-deformed f-divergence and is consistent with the theory derived
in Refs. [24] and [25], when extended to deformed statistics. The above result
may also be employed in the case of the dual generalized K-Ld between a
conditional probability and a marginal probability. Let $X$ and $Y$ be random
variables in $\mathcal{X}$ and $\mathcal{Y}$ respectively. Let the marginal discrete probability
measures be $\{p(x_i)\}_{i=1}^{n}$ and $\{p(y_j)\}_{j=1}^{m}$, respectively. In such circumstances, the
dual generalized K-Ld reads

$$D^{K-L}_{q^*}[p(Y|x_i)\|p(Y)] = \sum_j p(y_j|x_i)\ln_{q^*}\!\left(\frac{p(y_j|x_i)}{p(y_j)}\right), \qquad (38)$$

and is indeed a scaled Bregman divergence with the scaling $p(y_j)$.
5.2 Dual generalized K-Ld and the scaled Bregman information

Definition 4 [36]: For any Bregman divergence (or scaled Bregman
divergence) $B_\phi : S \times \mathrm{int}(S) \mapsto \mathbb{R}_+$ and any random variable $Z \sim w(z)$ (where $w(z)$
is the probability measure associated with $Z$), $z \in \mathcal{Z} \subseteq S$, the Bregman
information (or scaled Bregman information), which is a measure of the
information in $Z$, is defined as

$$I_\phi(Z) = \langle B_\phi(Z, \langle Z\rangle)\rangle. \qquad (39)$$

For example, let $X$ be a random variable that takes values in $\mathcal{X} = \{x_i\}_{i=1}^{n}$
following a probability measure $p(x)$. Let $\mu = \langle X\rangle = \sum_i p(x_i)\,x_i$, and let $B_\phi$
be a Bregman divergence (or scaled Bregman divergence). Then the Bregman
information (or scaled Bregman information) of $X$ is defined as

$$I_\phi(X) = \sum_i p(x_i)\,B_\phi(x_i, \mu). \qquad (40)$$

Consider a random variable $Z_x$ which takes values in the set of probability
distributions $\mathcal{Z}_x = \{p(Y|x_i)\}_{i=1}^{n}$, following the marginal probability $\{p(x_i)\}_{i=1}^{n}$
defined over this set. The expectation of $Z_x$ is

$$\mu = \sum_i p(x_i)\,p(Y|x_i) = \sum_i p(x_i, Y) = p(Y). \qquad (41)$$

Thus, from (38)-(41) the scaled Bregman information, which is the dual
generalized mutual information, may be defined as

$$\begin{aligned}
I_{q^*}(X;Y) &= \sum_i p(x_i)\,D^{K-L}_{q^*}[p(Y|x_i)\|p(Y)] \\
&= \sum_i p(x_i)\sum_j p(y_j|x_i)\ln_{q^*}\frac{p(y_j|x_i)}{p(y_j)} \\
&= D^{K-L}_{q^*}[p(X,Y)\|p(X)p(Y)] = I_\phi(Z_x).
\end{aligned} \qquad (42)$$

Similarly, the relation $I_{q^*}(X;Y) = I_\phi(Z_y)$ also holds true when $Z_y$ is a
random variable which takes values in the set of probability distributions $\mathcal{Z}_y =
\{p(X|y_j)\}_{j=1}^{m}$, following the marginal probability $\{p(y_j)\}_{j=1}^{m}$ defined over this
set. In this case, the scaled Bregman divergence is $D^{K-L}_{q^*}[p(X|y_j)\|p(X)] =
\sum_i p(x_i|y_j)\ln_{q^*}\frac{p(x_i|y_j)}{p(x_i)}$, and the normal averages expectation is calculated with
respect to $p(y_j)$.
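The equality between the normal-average expectation of the conditional divergences and the divergence between the joint distribution and the product of its marginals can be illustrated numerically. The Python sketch below uses hypothetical function names and a hypothetical 2x2 joint distribution, with rows indexing $x_i$ and columns indexing $y_j$.

```python
def ln_q(x, q):
    """q-deformed logarithm, valid for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def marginals(joint):
    """Row marginal p(x_i) and column marginal p(y_j) of a joint table."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y


def bregman_information(joint, q_star):
    """Expectation over p(x_i) of the dual K-Ld between p(Y|x_i) and p(Y)."""
    p_x, p_y = marginals(joint)
    total = 0.0
    for row, px in zip(joint, p_x):
        cond = [pxy / px for pxy in row]  # p(y_j | x_i)
        total += px * sum(c * ln_q(c / py, q_star) for c, py in zip(cond, p_y))
    return total


def dual_mutual_information(joint, q_star):
    """Dual K-Ld between the joint distribution and the product of marginals."""
    p_x, p_y = marginals(joint)
    return sum(pxy * ln_q(pxy / (px * py), q_star)
               for row, px in zip(joint, p_x)
               for pxy, py in zip(row, p_y))


# Hypothetical joint distribution, chosen only for illustration.
joint = [[0.1, 0.2], [0.3, 0.4]]
print(bregman_information(joint, 0.7), dual_mutual_information(joint, 0.7))
```

For an independent joint distribution both quantities vanish, as expected of a mutual information.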
For values $q^* > 1$, the scaled Bregman information acquires the form

$$\begin{aligned}
I_\phi(Z_x) &= \sum_{i,j} p(x_i, y_j)\ln_{q^*}\frac{p(y_j|x_i)}{p(y_j)} \\
&\overset{(a)}{=} \sum_{i,j} p(x_i, y_j)\,p(y_j)^{q^*-1}\left(\ln_{q^*} p(y_j|x_i) - \ln_{q^*} p(y_j)\right) \\
&\overset{(b)}{=} \sum_{i,j} p(x_i, y_j)\sum_i p(x_i)^{q^*-1} p(y_j|x_i)^{q^*-1}\left(\ln_{q^*} p(y_j|x_i) - \ln_{q^*} p(y_j)\right) \\
&= \sum_i\sum_j p(x_i)\,p(y_j|x_i)\sum_i p(x_i)^{q^*-1} p(y_j|x_i)^{q^*-1}\left(\ln_{q^*} p(y_j|x_i) - \ln_{q^*} p(y_j)\right) \\
&= \sum_i\sum_j p(x_i, y_j)^{q^*}\left(\ln_{q^*} p(y_j|x_i) - \ln_{q^*} p(y_j)\right) \\
&= -\sum_j p(y_j)^{q^*}\ln_{q^*} p(y_j) + \sum_{i,j} p(x_i, y_j)^{q^*}\ln_{q^*} p(y_j|x_i) \\
&= -\sum_i p(x_i)^{q^*}\ln_{q^*} p(x_i) + \sum_{i,j} p(x_i, y_j)^{q^*}\ln_{q^*} p(x_i|y_j).
\end{aligned} \qquad (43)$$

In the derivation (43), (a) denotes the use of (12) while (b) denotes setting
$p(y_j)^{q^*-1} = \sum_i p(x_i)^{q^*-1} p(y_j|x_i)^{q^*-1}$. Note that the last two expressions in
(43) are identical to (18) with the nonadditivity parameter $q^*$ replacing $q$ in
(18). Defining

$$\begin{aligned}
S_{q^*}(X) &= -\sum_x p(x)^{q^*}\ln_{q^*} p(x), \\
S_{q^*}(\tilde{X}|X) &= -\sum_x\sum_{\tilde{x}} p(x,\tilde{x})^{q^*}\ln_{q^*} p(\tilde{x}|x), \\
S_{q^*}(X,\tilde{X}) &= -\sum_x\sum_{\tilde{x}} p(x,\tilde{x})^{q^*}\ln_{q^*} p(x,\tilde{x}) \\
&= S_{q^*}(X) + S_{q^*}(\tilde{X}|X) = S_{q^*}(\tilde{X}) + S_{q^*}(X|\tilde{X}),
\end{aligned} \qquad (44)$$

the scaled Bregman information (43) acquires the form

$$I_\phi(Z_x) = S_{q^*}(Y) - S_{q^*}(Y|X) = S_{q^*}(X) - S_{q^*}(X|Y) = I_\phi(Z_y), \qquad (45)$$

where the inequalities $S_{q^*}(Y|X) \leq S_{q^*}(Y)$, and, $S_{q^*}(X|Y) \leq S_{q^*}(X)$ hold
true.

Comparison of (13) and (44) readily reveals that the original expressions for
the Tsallis entropy and conditional Tsallis entropy, and their equivalent forms
derived from the dual generalized mutual information (43), are invariant under
interchange of the nonadditivity parameters $q$ and $q^* = 2 - q$. While this is
indeed an appealing observation, two points need to be noted: (i) the physics of
the problem is defined by $q$ and not $q^*$, and, (ii) Eqs. (13) and (44) correspond
to two separate physical conditions.
The generalized mutual information (18) is expressed in terms of (13) for
$q > 1$. This corresponds to probability distributions of particular interest to
Tsallis statistics, i.e. "long-tailed" and power law distributions, amongst
others. On the other hand, (43)-(45) correspond to $q^* > 1 \Rightarrow q < 1$. This regime
is not of great interest in generalized statistics. Thus, when modeling problems
in generalized statistics (for example, see Ref. [3]) whose variational
principle requires invoking the properties of Bregman divergences, Theorem 2
(Eq. (20)) is to be employed in order to simultaneously achieve
information-geometric and physical consistency.



6 Summary and Discussions 

Our present endeavors have enabled us to reach several findings regarding the 
Tsallis environment. 

• The dual generalized K-Ld was shown to be a scaled Bregman divergence. 

• With regard to expectation values computed using normal averages, the
dual generalized mutual information was demonstrated to be a scaled
Bregman information.

• The correspondence linking the dual generalized K-Ld, the generalized Breg- 
man K-Ld (for probability distributions which maximize the dual Tsallis 
entropy when using normal-averages-constraints), and the usual form of the 
generalized K-Ld, has been established. Such a correspondence has not been 
previously investigated in Tsallis statistics literature. 

From the analyses in Sections 3-5, it becomes obvious from a combined statis- 
tical physics plus information geometric perspective that the dual generalized 
K-Ld should also be employed as the measure of uncertainty when perform- 
ing a minimum cross entropy analysis (principle of minimum discrimination 
information) [28, 29, 37] for constraints that employ normal averages. 

A simpler justification stems from the fact that while in the orthodox B-G-S
theory the K-Ld is a Bregman divergence [12], its Tsallis counterpart is not a
Bregman divergence. Instead, as established in this paper, the dual generalized
K-Ld is a scaled Bregman divergence. Future work will use the results derived
herein to analyze: (i) the generalized statistics rate distortion theory [2], (ii)
the generalized statistics information bottleneck method [3] within the context
of scaled Bregman divergences and scaled Bregman informations, and (iii)
deformed statistics extensions of the minimum Bregman information principle
and their applications in machine learning [36].
Appendix A: Derivation of expression for the canonical probability
which maximizes the dual Tsallis entropy 



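From (26), the maximum dual Tsallis entropy Lagrangian is

$$\Phi_{q^*}[p, \alpha, \beta] = -\sum_i p_i\ln_{q^*} p_i - \alpha\left(\sum_i p_i - 1\right) - \beta\left(\sum_i p_i E_i - U\right). \qquad (A.1)$$

Employing the stationarity condition $\frac{\partial \Phi_{q^*}[p, \alpha, \beta]}{\partial p_i} = 0$ for each $p_i$ and the
normalization condition $\sum_i p_i = 1$ yields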



.gz£J p ;-«-_^_ a = 



Pi 



(A.2) 



Employing the Ferri-Martinez-Plastino methodology [33], the normalization 
Lagrange multiplier a is obtained as follows. Multiplying the first equation in 
(A.2) by pi and summing over all indices i yields 



Substituting (A. 3) into (A.2) yields 



(A.3) 



Pi 



Pi = 



Xpf-^PiE-U) 



2-g* 



1-9* 



(A.4) 



Thus (27) is derived. 